Make lower and upper emit Utf8View for Utf8View input by kumarUjjawal · Pull Request #20616 · apache/datafusion

kumarUjjawal · 2026-02-28T13:36:35Z

Which issue does this PR close?

Part of Update string UDF's to have return type == input type where appropriate #20585

Rationale for this change

String UDFs should preserve string representation where feasible. lower and upper previously accepted Utf8View input but emitted Utf8, causing an unnecessary type downgrade. This change aligns both functions with the expected behavior of returning the same string type as their primary input.

What changes are included in this PR?

Updated lower and upper return type inference to emit Utf8View when input is Utf8View

Are these changes tested?

Yes

Are there any user-facing changes?

neilconway

Thanks for looking at this!

The PR description has a typo (still talks about repeat).

Personally I find it a bit confusing to have the same / overlapping changes in concurrent PRs (e.g., for upper and lower).

neilconway · 2026-02-28T14:45:44Z

datafusion/functions/src/string/common.rs

+}
+
+#[derive(Debug, Clone, Copy)]
+enum Utf8ViewOutput {


Why do we need a new enum -- can't we just pass the return data type the caller expects?

neilconway · 2026-02-28T14:49:01Z

datafusion/functions/src/string/lower.rs

+        if arg_types[0] == DataType::Utf8View {
+            Ok(DataType::Utf8View)
+        } else {
+            utf8_to_str_type(&arg_types[0], "lower")


I wonder if it would be helpful to have a helper here that handles all three string representations and returns the optimal return type. Having a helper for Utf8 vs. LargeUtf8 but handling Utf8View at the call-site seems a bit odd.
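
For illustration only, a helper of the kind suggested here might look roughly like the sketch below. The `DataType` enum is a local stand-in for the Arrow type, and `string_return_type` is a hypothetical name; neither is taken from the PR, and the real helper would also need to decide how to treat other inputs such as dictionaries or nulls.

```rust
// Local stand-in for Arrow's DataType; only the variants this sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    LargeUtf8,
    Utf8View,
    Int32,
}

/// Hypothetical helper: a string-to-string function returns the same
/// string representation it was given, for all three representations.
/// Any other input type produces an error that names the function.
fn string_return_type(arg_type: &DataType, name: &str) -> Result<DataType, String> {
    match arg_type {
        DataType::Utf8 | DataType::LargeUtf8 | DataType::Utf8View => Ok(arg_type.clone()),
        other => Err(format!("{name} does not support input type {other:?}")),
    }
}
```

With a helper shaped like this, the call site in lower or upper would reduce to a single call such as `string_return_type(&arg_types[0], "lower")`, with no special case for Utf8View.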

neilconway · 2026-02-28T14:56:39Z

datafusion/functions/src/string/common.rs

-                    } else {
-                        string_builder.append_null();
+                match utf8view_output {
+                    Utf8ViewOutput::Utf8 => {


I'm confused why we need to add code to handle the "Utf8View input, Utf8 output" case -- seems like we don't actually want that behavior. If we fixed up upper and lower as part of the same PR, couldn't we just have case_conversion return a value of the same type as its input? That would avoid this redundant code and also get rid of the utf8view_output parameter.
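
To make the suggestion concrete, here is a minimal sketch of a conversion routine whose output representation always matches its input, so no extra output-selection parameter is needed. `StringColumn` is a simplified stand-in for the Arrow string arrays, and the function bodies are assumptions for illustration, not the code in `common.rs`.

```rust
// Simplified stand-in for the three Arrow string layouts (NOT arrow-rs types).
// Each variant holds nullable values; only the variant tag differs.
#[derive(Debug, Clone, PartialEq)]
enum StringColumn {
    Utf8(Vec<Option<String>>),
    LargeUtf8(Vec<Option<String>>),
    Utf8View(Vec<Option<String>>),
}

/// Apply `op` to every non-null value and keep nulls as nulls.
fn convert_values<F: Fn(&str) -> String>(
    values: &[Option<String>],
    op: &F,
) -> Vec<Option<String>> {
    values.iter().map(|v| v.as_deref().map(|s| op(s))).collect()
}

/// Hypothetical shape of the conversion: the output variant is chosen by
/// the input variant alone, so callers pass no output-type argument.
fn case_conversion<F: Fn(&str) -> String>(input: &StringColumn, op: F) -> StringColumn {
    match input {
        StringColumn::Utf8(v) => StringColumn::Utf8(convert_values(v, &op)),
        StringColumn::LargeUtf8(v) => StringColumn::LargeUtf8(convert_values(v, &op)),
        StringColumn::Utf8View(v) => StringColumn::Utf8View(convert_values(v, &op)),
    }
}

fn lower(input: &StringColumn) -> StringColumn {
    case_conversion(input, |s| s.to_lowercase())
}

fn upper(input: &StringColumn) -> StringColumn {
    case_conversion(input, |s| s.to_uppercase())
}
```

In this sketch, calling `upper` on a `StringColumn::Utf8View` yields a `StringColumn::Utf8View`, and the "Utf8View input, Utf8 output" branch does not exist at all.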

kumarUjjawal · 2026-03-02T08:14:14Z

> Thanks for looking at this!
>
> The PR description has a typo (still talks about repeat).
>
> Personally I find it a bit confusing to have the same / overlapping changes in concurrent PRs (e.g., for upper and lower).

Thanks for the feedback. I agree I should have done both lower and upper in the same PR; that makes much more sense. I'm working on it, and will update this PR and close the other one. I had initially thought about doing that, but I don't remember why I didn't.

neilconway · 2026-03-02T13:48:24Z

datafusion/functions/src/string/lower.rs

+    }
+
+    #[test]
+    fn lower_return_type_dictionary_utf8view() -> Result<()> {


For symmetry, should we add an analogous test case for upper?

neilconway · 2026-03-02T13:49:36Z

datafusion/functions/src/string/common.rs

    }
 }

+pub(crate) fn case_conversion_return_type(


This doesn't seem specific to case conversion; we probably need a more generic facility for finding the return type for a function that takes a string. But we can handle that later.

neilconway · 2026-03-02T14:02:34Z

datafusion/functions/src/string/common.rs

+    name: &str,
+) -> Result<DataType> {
+    match arg_type {
+        DataType::Utf8View | DataType::BinaryView => Ok(DataType::Utf8View),


I believe BinaryView will be rejected by the type signatures for these functions, so probably clearer if we don't handle it.

neilconway · 2026-03-02T14:04:33Z

datafusion/sqllogictest/test_files/functions.slt

+LargeUtf8
+
+query T
+SELECT arrow_typeof(upper(arrow_cast('foo', 'Utf8View')))


Add SLT tests for dictionary with a Utf8View value type?

Make lower emit Utf8View for Utf8View input

837f896

github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) and functions (Changes to functions implementation) labels Feb 28, 2026

neilconway reviewed Feb 28, 2026

kumarUjjawal added 2 commits March 2, 2026 14:53

Make lower and upper emit Utf8View for Utf8View input

d92a711

Handle dictionary Utf8View return type in case conversion

d553ac1

kumarUjjawal changed the title ~~Make lower emit Utf8View for Utf8View input~~ Make lower and upper emit Utf8View for Utf8View input Mar 2, 2026

kumarUjjawal mentioned this pull request Mar 2, 2026

Make upper emit Utf8View for Utf8View input #20615

Closed

neilconway reviewed Mar 2, 2026

kumarUjjawal wants to merge 3 commits into apache:main from kumarUjjawal:refactor/lower_utf8view

2 participants
